Text File | 2000-05-25 | 49.5 KB | 1,205 lines

::/ \::::::.
:/___\:::::::.
/|    \::::::::.
:|   _/\:::::::::.
:|   _|\  \::::::::::.                                              Apr-June 99
:::\_____\::::::::::.                                               Issue 4
::::::::::::::::::::::.........................................................

            A S S E M B L Y   P R O G R A M M I N G   J O U R N A L
                      http://asmjournal.freeservers.com
                           asmjournal@mailcity.com


T A B L E   O F   C O N T E N T S
----------------------------------------------------------------------
Introduction...................................................mammon_

"Using COM in Assembly Language"..........................Lord.Lucifer
"Stack Frames and High-Level Calls"............................mammon_
"Define Your Memory".......................................Alan Baylis
"Writing a Boot Sector in A86"...........................Jan Verhoeven
"A Basic Virus Writing Primer"...................................Chili

Column: Win32 Assembly Programming
    "Mouse Input....".........................................Iczelion
    "Menus"...................................................Iczelion

Column: The C standard library in Assembly
    "C string functions:_strtok"................................Xbios2

Column: The Unix World
    "Using Menus in Xt"........................................mammon_

Column: Assembly Language Snippets
    "Triple XOR".........................................Jan Verhoeven
    "Trailing Calls".....................................Jan Verhoeven

Column: Issue Solution
    "Fire Demo"....................................................iCE

----------------------------------------------------------------------
 +++++++++++++++++++++Issue Challenge+++++++++++++++++++
 Write a "Fire Demo"-style program in less than 100 bytes
----------------------------------------------------------------------



::/ \::::::.
:/___\:::::::.
/|    \::::::::.
:|   _/\:::::::::.
:|   _|\  \::::::::::.
:::\_____\:::::::::::..............................................INTRODUCTION
                                                                     by mammon_

In the last few months I have come across a number of links to APJ, and have
received the proverbial ton of email regarding it. Strangely enough, the
majority of these tend to agree that the one problem with the journal is its
infrequent --if not irregular-- publication. If that is the only complaint so
far, I think I can cope with it ;)

This issue is, naturally, very late due to what could be called "real world"
[lit., "that which does not go away when a power outage kills your PC"]
considerations; however the articles by weight alone should make up for some
of this. The largest of the bunch is undoubtedly the virus writing tutorial by
Chili, who may have beaten my previous record for article length: a very
thorough work, worth reading just to help protect against viruses, if not to
write them. This is accompanied by Jan's discussion of boot sector
programming...a suitable companion article, I believe.

High-level coders will undoubtedly be interested in Lord Lucifer's article on
COM programming in assembly; it seems that high-level areas such as COM,
DirectDraw, and Winsock coding are starting to receive a fair degree of
attention from the assembly language world, judging from the tutorials I have
been coming across.

Xbios2 has continued his excellent C stdlib work, and Iczelion has contributed
two more of his now-legendary Win32 asm tutorials; I of course have kept up
the Unix vanguard with yet another Xt article. This month's challenge was
contributed by iCE, and had a .text-size I could not readily beat.

A few brief notes concerning the web page: I have thrown together a basic
collection of assembly language links at
        http://asmjournal.freeservers.com/lynx.html
Submissions for this links page are welcome.
I have also been getting a few emails to the APJ inbox asking or offering help
with assembly language; since I check the inbox fortnightly at best, I have
added a "classified ads" page to the APJ website at
        http://www.guestbook4free.com/en/28806/entries/
which is essentially a guestbook where people can post contact info, projects
they need help with, etc ... more or less a one-way bulletin board like, well,
like classified ads are.

That should just about wrap things up. Enjoy the issue!

_m



::/ \::::::.
:/___\:::::::.
/|    \::::::::.
:|   _/\:::::::::.
:|   _|\  \::::::::::.
:::\_____\:::::::::::...........................................FEATURE.ARTICLE
                                                 Using COM in Assembly Language
                                                                by Lord Lucifer


This article will discuss how to use COM interfaces in your assembly language
programs. It will not discuss what COM is and how it is used, but rather how
it can be used when programming in assembler. It will discuss only how to use
existing interfaces, and not how to actually implement new ones; this will be
shown in a future article.


About COM
------------------------------------------------------------------------------
Here is a brief introduction to the basics behind COM.

A COM object is one in which access to an object's data is achieved
exclusively through one or more sets of related functions. These function
sets are called interfaces, and the functions of an interface are called
methods. COM requires that the only way to gain access to the methods of an
interface is through a pointer to the interface.

An interface is actually a contract that consists of a group of related
function prototypes whose usage is defined but whose implementation is not.
An interface definition specifies the interface's member functions, called
methods, their return types, the number and types of their parameters, and
what they must do. There is no implementation associated with an interface.
An interface implementation is the code a programmer supplies to carry out
the actions specified in an interface definition. An instance of an interface
implementation is actually a pointer to an array of pointers to methods (a
function table that refers to an implementation of all of the methods
specified in the interface). Any code that has a pointer through which it can
access the array can call the methods in that interface.


Using a COM object in assembly language
-------------------------------------------------------------------------------
Access to a COM object occurs through a pointer. The first DWORD at the
address this pointer refers to holds the address of a table of function
pointers in memory, called a virtual function table, or vtable for short.
This vtable contains the addresses of each of the object's methods. To call a
method, you call it indirectly through this pointer table.

Here is an example of a C++ interface, and how its methods are called:

    interface IInterface
    {
        HRESULT QueryInterface( REFIID iid, void ** ppvObject );
        ULONG   AddRef();
        ULONG   Release();
        HRESULT Function1( INT param1, INT param2 );
        HRESULT Function2( INT param1 );
    };

    // calling the Function1 method
    pObject->Function1( 0, 0 );

Now here is how the same functionality can be implemented using assembly
language:

    ; defining the interface
    ; each of these values is a byte offset into the vtable
    QueryInterface      equ     0h
    AddRef              equ     4h
    Release             equ     8h
    Function1           equ     0Ch
    Function2           equ     10h

    ; calling the Function1 method in asm
    ; the method is called by obtaining the address of the object's
    ; vtable and then calling the function addressed by the proper
    ; offset in the table
    push    param2
    push    param1
    mov     eax, pObject
    push    eax                     ; the object pointer is the first argument
    mov     eax, [eax]              ; eax = address of the vtable
    call    [eax + Function1]

You can see this is somewhat different from calling a function normally. Here,
pObject points to the object, whose first DWORD is a pointer to the
interface's vtable. At the Function1 (0Ch) offset in this table is a pointer
to the actual function we wish to call.
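
For readers who find the double indirection easier to follow in a high-level
language, here is a minimal sketch of the same call written in C. Everything
in it is hypothetical and made up for illustration (the Object type, the
demo_* functions and call_function1 are not part of any real COM header, and
the methods return plain long rather than HRESULT); it only mirrors the
"object -> vtable -> method" layout described above. On a 32-bit build the
five slots would sit at offsets 0, 4, 8, 0Ch and 10h, as in the equ list.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a COM object: its first field is the vtable
   pointer, just like the DWORD fetched by "mov eax, [eax]". */
typedef struct Object Object;

typedef struct ObjectVtbl {
    long          (*QueryInterface)(Object *self, const void *iid, void **out);
    unsigned long (*AddRef)(Object *self);
    unsigned long (*Release)(Object *self);
    long          (*Function1)(Object *self, int param1, int param2);
    long          (*Function2)(Object *self, int param1);
} ObjectVtbl;

struct Object {
    const ObjectVtbl *vtbl;   /* pointer to the table of method pointers */
    unsigned long     refs;   /* toy reference count */
};

/* Toy method implementations, only here so the table has something in it. */
static long demo_query(Object *self, const void *iid, void **out)
{
    (void)iid;
    *out = self;
    self->refs++;
    return 0;
}
static unsigned long demo_addref(Object *self)  { return ++self->refs; }
static unsigned long demo_release(Object *self) { return --self->refs; }
static long demo_function1(Object *self, int p1, int p2)
{
    (void)self;
    return (long)p1 + p2;
}
static long demo_function2(Object *self, int p1)
{
    (void)self;
    return 2L * p1;
}

static const ObjectVtbl demo_vtbl = {
    demo_query, demo_addref, demo_release, demo_function1, demo_function2
};

Object demo_object = { &demo_vtbl, 1 };

/* Same steps as the assembly: pass the object pointer as the hidden first
   argument, fetch the vtable from the object, call through the slot. */
long call_function1(Object *pObject, int param1, int param2)
{
    assert(pObject != NULL && pObject->vtbl != NULL);
    return pObject->vtbl->Function1(pObject, param1, param2);
}
```

Calling call_function1(&demo_object, 2, 3) goes through exactly the two
memory reads the assembly version performs before its indirect call.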


Using HRESULTs
-------------------------------------------------------------------------------
The return value of OLE APIs and methods is an HRESULT. This is not a handle
to anything, but is merely a 32-bit value with several fields encoded in the
value. The parts of an HRESULT are shown below.

HRESULTs are 32 bit values laid out as follows:

   3 3 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1
   1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0
  +-+-+-+-+-+---------------------+-------------------------------+
  |S|R|C|N|r|      Facility       |             Code              |
  +-+-+-+-+-+---------------------+-------------------------------+

 S - Severity Bit
     Used to indicate success or failure
     0 - Success
     1 - Fail

     By noting that this bit is actually the sign bit of the 32-bit value,
     checking success/failure is simply performed by checking its sign:

        call    ComFunction         ; call the function
        test    eax,eax             ; now check its return value
        js      error               ; jump if signed (meaning error returned)
        ; success, so continue

 R - reserved portion of the facility code, corresponds to NT's
     second severity bit.

 C - reserved portion of the facility code, corresponds to NT's
     C field.

 N - reserved portion of the facility code. Used to indicate a
     mapped NT status value.

 r - reserved portion of the facility code. Reserved for internal
     use. Used to indicate HRESULT values that are not status
     values, but are instead message ids for display strings.

 Facility - is the facility code
     FACILITY_WINDOWS  = 8
     FACILITY_STORAGE  = 3
     FACILITY_RPC      = 1
     FACILITY_WIN32    = 7
     FACILITY_CONTROL  = 10
     FACILITY_NULL     = 0
     FACILITY_ITF      = 4
     FACILITY_DISPATCH = 2

     To retrieve the Facility,

        call    ComFunction         ; call the function
        shr     eax, 16             ; shift the HRESULT to the right by 16 bits
        and     eax, 1FFFh          ; mask the bits, so only the facility remains
        ; eax now contains the HRESULT's Facility code

 Code - is the facility's status code

     To get the Facility's status code,

        call    ComFunction         ; call the function
        and     eax, 0000FFFFh      ; mask out the upper 16 bits
        ; eax now contains the HRESULT's Facility's status code


Using COM with MASM
------------------------------------------------------------------------------
If you use MASM to assemble your programs, you can use some of its
capabilities to make calling COM functions very easy. Using invoke, you can
make COM calls look almost as clean as regular calls, plus you can add type
checking to each function.

Defining the interface:

    ; every method receives the object pointer as its first parameter
    IInterface_Function1Proto   typedef proto :DWORD, :DWORD, :DWORD
    IInterface_Function2Proto   typedef proto :DWORD, :DWORD

    IInterface_Function1        typedef ptr IInterface_Function1Proto
    IInterface_Function2        typedef ptr IInterface_Function2Proto

    ; the IUnknown_* pointer types are assumed to be defined the same way
    IInterface struct DWORD
        QueryInterface  IUnknown_QueryInterface     ?
        AddRef          IUnknown_AddRef             ?
        Release         IUnknown_Release            ?
        Function1       IInterface_Function1        ?
        Function2       IInterface_Function2        ?
    IInterface ends

Using the interface to call COM functions:

    mov     eax, pObject
    mov     eax, [eax]
    invoke  (IInterface ptr [eax]).Function1, pObject, 0, 0

As you can see, the syntax may seem a bit strange, but it allows for a simple
method using the function name itself instead of offsets, as well as type
checking.


A Sample program written using COM
------------------------------------------------------------------------------
Here is some sample source code which uses COM written in straight assembly
language, so it should be compatible with any assembler you prefer with only
minor changes necessary.
This program uses the Windows Shell Interfaces to show the contents of the
Desktop folder in a window. The program is not complete, but shows how the
COM library is initialized, de-initialized, and used. It also shows how the
shell library is used to get folders and objects, and how to perform actions
on them.

.386
.model flat, stdcall

include windows.inc     ; include the standard windows header
include shlobj.inc      ; this include file contains the shell namespace
                        ; definitions and constants

;----------------------------------------------------------
.data
  wMsg              MSG         <?>
  g_hInstance       dd          ?
  g_pShellMalloc    dd          ?

  pshf              dd          ?       ; shell folder object
  peidl             dd          ?       ; enum id list object

  lvi               LV_ITEM     <?>
  iCount            dd          ?
  strret            STRRET      <?>
  shfi              SHFILEINFO  <?>
  ...

;----------------------------------------------------------
.code
; Entry Point
start:
    push    0h
    call    GetModuleHandle
    mov     g_hInstance,eax

    call    InitCommonControls

    ; initialize the Component Object Model (COM) library
    ; this function must be called before any COM functions are called
    push    0
    call    CoInitialize
    test    eax,eax                 ; error when the MSB = 1
                                    ; (MSB = the sign bit)
    js      exit                    ; js = jump if signed

    ; Get the Shell's IMalloc object pointer, and save it to a global variable
    push    offset g_pShellMalloc
    call    SHGetMalloc
    cmp     eax, E_FAIL
    jz      shutdown

    ; here we would set up the windows, list view, message loop, and so on....
    ; we would also call the FillListView procedure...
    ; ....

    ; Cleanup
    ; Release IMalloc Object pointer
    mov     eax, g_pShellMalloc
    push    eax
    mov     eax, [eax]
    call    [eax + Release]         ; g_pShellMalloc->Release();

shutdown:
    ; close the COM library
    call    CoUninitialize

exit:
    push    wMsg.wParam
    call    ExitProcess
    ; Program Terminates Here

;----------------------------------------------------------
FillListView proc

    ; get the desktop shell folder, saved to pshf
    push    offset pshf
    call    SHGetDesktopFolder

    ; get the objects of the desktop folder using the EnumObjects method of
    ; the desktop's shell folder object
    push    offset peidl
    push    SHCONTF_NONFOLDERS
    push    0
    mov     eax, pshf
    push    eax
    mov     eax, [eax]
    call    [eax + EnumObjects]

    ; now loop through the enum id list
idlist_loop:
    ; Get next id list item
    push    0
    push    offset pidl             ; Next stores the item's pidl here
    push    1
    mov     eax, peidl
    push    eax
    mov     eax, [eax]
    call    [eax + Next]
    test    eax,eax
    jnz     idlist_endloop          ; anything but S_OK ends the loop

    mov     lvi.imask, LVIF_TEXT or LVIF_IMAGE
    mov     lvi.iItem,

    ; Get the item's name by using the GetDisplayNameOf method
    push    offset strret
    push    SHGDN_NORMAL
    push    pidl                    ; the pidl itself, not its address
    mov     eax, pshf
    push    eax
    mov     eax, [eax]
    call    [eax + GetDisplayNameOf]
    ; GetDisplayNameOf returns the name in 1 of 3 forms, so get the correct
    ; form and act accordingly
    cmp     strret.uType, STRRET_CSTR
    je      strret_cstr
    cmp     strret.uType, STRRET_OFFSET
    je      strret_offset

strret_olestr:
    ; here you could use WideCharToMultiByte to get the string,
    ; I have left it out because I am lazy
    jmp     strret_end

strret_cstr:
    lea     eax, strret.cStr
    jmp     strret_end

strret_offset:
    mov     eax, pidl
    add     eax, strret.uOffset

strret_end:
    mov     lvi.pszText, eax

    ; Get the item's icon
    push    SHGFI_PIDL or SHGFI_SYSICONINDEX or SHGFI_SMALLICON or SHGFI_ICON
    push    sizeof SHFILEINFO
    push    offset shfi
    push    0
    push    pidl
    call    SHGetFileInfo
    mov     eax, shfi.iIcon
    mov     lvi.iImage, eax

    ; now add item to the list
    push    offset lvi
    push    0
    push    LVM_INSERTITEM
    push    hWndListView
    call    SendMessage

    ; repeat the loop
    jmp     idlist_loop

idlist_endloop:

    ; now free the enum id list
    ; Remember all allocated objects must be released...
    mov     eax, peidl
    push    eax
    mov     eax,[eax]
    call    [eax + Release]

    ; free the desktop shell folder object
    mov     eax, pshf
    push    eax
    mov     eax,[eax]
    call    [eax + Release]

    ret
FillListView endp

END start


Conclusion
-------------------------------------------------------------------------------
Well, that is about it for using COM with assembly language. Hopefully, my
next article will go into how to define your own interfaces. As you can see,
using COM is not difficult at all, and with it you can add a very powerful
capability to your assembly language programs.



::/ \::::::.
:/___\:::::::.
/|    \::::::::.
:|   _/\:::::::::.
:|   _|\  \::::::::::.
:::\_____\:::::::::::...........................................FEATURE.ARTICLE
                                              Stack Frames and High-Level Calls
                                                                     by mammon_


Last month I covered how to implement high-level calls in Nasm. Since then it
has come to my attention that many beginning programmers are unfamiliar with
calling conventions and the stack frame; to remedy this I have prepared a
brief discussion of these topics.


The CALL Instruction
--------------------
At its most basic, an assembly language call takes this form:

        push [parameters]
        call [address]

Some assemblers will require that the CALL statement take as an argument only
addresses leading to external functions or addresses created with a macro or
directive such as PROC. However, as a quick glance through a debugger or a
passing familiarity with Nasm will demonstrate, the CALL instruction simply
jumps to an address [often a label in the source code] while pushing the
contents of EIP [containing the address of the instruction following the
call] onto the stack.
The CALL instruction is therefore equivalent to the following code:

        push EIP
        jmp [address]

The address that has been called will therefore have the stack set up as
follows:

        [Last Parameter Pushed]: DWORD
        [Address of Caller]    : DWORD
        --- "Top" of Stack [esp] ---

At this point, anything pushed onto the stack will be on top of [that is, with
a lower memory address, since the stack "grows" downwards] the return address.


The Stack Frame
---------------
Note that the parameters to the call therefore cannot be POPed from the stack,
as this will destroy the saved return address and thus cause the application
to crash upon returning from the call [unless, of course, a chosen return
address is PUSHed onto the stack before returning from the call].

The logical way to reference these parameters, then, would be as offsets from
the stack pointer:

        [parameter 2]      : DWORD esp + 8
        [parameter 1]      : DWORD esp + 4
        [Address of Caller]: DWORD esp
        ----- "Top" of Stack [esp] -----

In this example, "parameter 1" is the parameter pushed onto the stack last,
and "parameter 2" is the parameter pushed onto the stack before parameter 1,
as follows:

        push [parameter 2]
        push [parameter 1]
        call [procedure]

The problem with referring to parameters as offsets from esp is that esp will
change whenever a value is PUSHed onto the stack during the routine. For this
reason, it is standard for routines which take parameters to set up a "stack
frame".

In a stack frame, the base pointer [ebp] is set equal to the stack pointer
[esp] at the start of the call; this provides a "base" address from which
parameters can be addressed as offsets. It is assumed that the caller had a
stack frame also; thus the value of ebp must be preserved in order to prevent
causing damage to the caller. The stack frame usually takes the following
form:

        push ebp
        mov ebp, esp
        ... [actual code for the routine] ...
        mov esp, ebp
        pop ebp

This means that once the stack frame has been entered, the stack has the
following structure:

        [parameter 2]      : DWORD ebp + 12
        [parameter 1]      : DWORD ebp + 8
        [Address of Caller]: DWORD ebp + 4
        [Old Base Pointer] : DWORD ebp
        ----- Base Pointer [ebp] -----
        ----- "Top" of Stack [esp] -----

The use of the base pointer also allows space to be allocated on the stack for
local variables. This is done by simply subtracting bytes from esp; since esp
is restored when the stack frame is exited, this space will automatically be
deallocated. The local variables are then referred to as *negative* offsets
from ebp; these may be EQUed to meaningful symbol names in the source code. A
routine that has 3 local DWORD variables would take the following form:

        Var1 EQU [ebp-4]
        Var2 EQU [ebp-8]
        Var3 EQU [ebp-12]       ;provide meaningful names for the variables

        push ebp
        mov ebp, esp
        sub esp, 3*4            ;3 DWORDs at 4 BYTEs apiece
        ... [actual code for the routine] ...
        mov esp, ebp
        pop ebp

This routine would then have the following stack structure after the
allocation of the local variables:

        [parameter 2]      : DWORD ebp + 12
        [parameter 1]      : DWORD ebp + 8
        [Address of Caller]: DWORD ebp + 4
        [Old Base Pointer] : DWORD ebp
        ----- Base Pointer [ebp] -----
        [Var1]             : DWORD ebp - 4
        [Var2]             : DWORD ebp - 8
        [Var3]             : DWORD ebp - 12
        ----- "Top" of Stack [esp] -----

The stack frame can also be used to provide a call trace, as it stores the
base pointer of [and thus a pointer to the caller of] the caller.
Assume that a program has the following flow of execution:

        proc_1: push dword call1_p2
                push dword call1_p1
                call proc_2
        ________proc_2: push call2_p1
                        call proc_3
        ________________proc_3: push call3_p1
                                call proc_4

Upon creation of the stack frame in proc_4, the stack has the following
structure:

        [call1_p2]             : DWORD ebp + 36
        [call1_p1]             : DWORD ebp + 32
        [Return Addr of Call1] : DWORD ebp + 28
        [Old Base Pointer]     : DWORD ebp + 24
        ---- Base Pointer of Call 1 ----
        [call2_p1]             : DWORD ebp + 20
        [Return Addr of Call2] : DWORD ebp + 16
        [Base Pointer of Call1]: DWORD ebp + 12
        ---- Base Pointer of Call 2 ----
        [call3_p1]             : DWORD ebp + 8
        [Return Addr of Call3] : DWORD ebp + 4
        [Base Pointer of Call2]: DWORD ebp
        ----- Base Pointer [ebp] -----
        ----- "Top" of Stack [esp] -----

As you can see, for each previous call the return address is [ebp+4], where
ebp is the address of the saved base pointer for the call previous to that
one. Thus, one could traverse the history of stack frames as follows:

        mov eax, ebp            ; eax = address of previous ebp
        mov ecx, 10             ; trace the last 10 calls
loop_start:
        mov ebx, [eax+4]        ; ebx = return address for call
        call print_stack_trace  ; assumed to preserve eax and ecx
        mov eax, [eax]          ; step back one stack frame
        loop loop_start

This is exceptionally useful for exception handling; the handling function
will be able to print out a stack history to aid debugging.

This principle can also be applied in conjunction with debugging code [for
example, the Win32 debug API] to create a utility which will trace the calls
[in reality, the stack frames of the calls] made by a target. Essentially,
this would boil down to the following logic:

        1) Breakpoint on changes to EBP
        2) On Break, get return address [ebp+4]
        3) Get instruction prior to return address
        4) Print or log the instruction

Note that this can be enhanced to resolve symbol names in the logged CALL
instruction, such that local or API address labels [e.g. GetWindowTextA] can
be logged rather than just the address itself.
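
The frame walk above can also be modelled in portable C. The sketch below is
only an illustration under stated assumptions: the Frame structure and the
walk_frames function are hypothetical names, the two fields stand in for the
DWORDs found at [ebp] and [ebp+4], and the chain is assumed to end with a
NULL saved base pointer [the assembly loop above simply stops after a fixed
count instead].

```c
#include <assert.h>
#include <stddef.h>

/* What "push ebp / mov ebp, esp" leaves behind, seen from ebp upwards. */
typedef struct Frame {
    struct Frame *saved_bp;   /* [ebp]   : the caller's base pointer      */
    const void   *ret_addr;   /* [ebp+4] : return address into the caller */
} Frame;

/* Collect up to max return addresses, most recent call first.
   Returns the number of entries written to out. */
size_t walk_frames(const Frame *bp, const void **out, size_t max)
{
    size_t n = 0;

    assert(out != NULL);
    while (bp != NULL && n < max) {
        out[n++] = bp->ret_addr;   /* mov ebx, [eax+4] */
        bp = bp->saved_bp;         /* mov eax, [eax]   */
    }
    return n;
}
```

A caller would hand in the current base pointer and a buffer, then print or
log each collected address; bounding the walk by both max and the NULL
terminator keeps it from running off the end of a short call chain.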


The ENTER Instruction
---------------------
The ENTER instruction is used to create a stack frame with a single
instruction; with both parameters zero [ENTER 0,0] it is equivalent to the
code

        push ebp
        mov ebp, esp

The ENTER instruction takes a first parameter that specifies the number of
bytes to reserve for local variables; a second parameter gives the nesting
level [0-31] of the current stack frame in the overall program structure.
This is often used by high-level languages to save call trace information for
error handlers, as it specifies the number of additional [previous] stack
frame pointers to save on the stack.


The RET Instruction
-------------------
Any routine which is accessed by a CALL instruction must be terminated with a
return [RET] instruction. As one can see from the operation of the CALL
instruction, if you were to attempt to circumvent the RET instruction by
JMPing to the return address, the stack would still be corrupted, since the
return address pushed by the CALL would never be removed. The RET statement is
roughly equivalent to the following code:

        pop EIP

Note that the RET must take place after exiting the stack frame in order to
avoid corruption of the stack.


The LEAVE Instruction
---------------------
The LEAVE instruction is used to exit a stack frame created with the ENTER
instruction; it is equivalent to the code

        mov esp, ebp
        pop ebp

The LEAVE instruction takes no parameters and still requires a RET statement
to follow it.


High-level Language Calling Conventions
---------------------------------------
At this point one may wonder what has happened to the parameters pushed onto
the stack prior to the call. Are they still on the stack after the RET, or
have they been cleared?

Since the parameters cannot be POPed from the stack while within the call,
they are still on the stack at the RET instruction. At this point the
programmer has two options.
They can have the caller clean up the stack by adding the number of bytes
pushed to esp immediately after the call:

        push dword param2
        push dword param1
        call procedure
        add esp, 2 * 4          ;2 DWORDs at 4 BYTEs apiece

Or they can clear the stack by passing to the RET instruction the number of
bytes that need to be cleared:

        push dword param2
        push dword param1
        call procedure
        ...
procedure:
        push ebp
        mov ebp, esp
        ...
        mov esp, ebp
        pop ebp
        ret 8                   ;2 DWORDs at 4 BYTEs apiece

Which method is chosen is left up to the programmer; however, when writing a
library or API, one must make clear who is responsible for cleaning up the
stack. In addition, when interfacing with high-level languages, one also has
to make clear which order the parameters are to be pushed in. For this reason
there are calling conventions for the high-level languages.

The C calling convention is used to interface with the C and C++ programming
languages; it is used in the standard C library and in Unix APIs. It pushes
the parameters from right to left, and the called routine does not clean up
the stack upon return from the call; the caller does. A call to a C-style
routine would look as follows:

        ;corresponds to the C code
        ;procedure(param1, param2)
        push dword param2
        push dword param1
        call procedure
        add esp, 8

A C-style routine would have the following structure:

        push ebp
        mov ebp, esp
        ...
        mov esp, ebp
        pop ebp
        ret

The Pascal calling convention is used to interface with the Pascal, BASIC, and
Fortran programming languages; it is used in the Win16 API. It pushes the
parameters from left to right, and cleans up the stack upon return from the
call; as such it is the opposite of the C convention. A call to a Pascal
routine would look as follows:

        ;corresponds to the C code
        ;procedure(param1, param2)
        push dword param1
        push dword param2
        call procedure

A Pascal-style routine would have the following structure:

        push ebp
        mov ebp, esp
        ...
        mov esp, ebp
        pop ebp
        ret 8                   ;clear the 2 dword parameters

The Stdcall ["standard call" or __stdcall] calling convention is a combination
of the C and Pascal conventions; it is used in the Win32 API. It pushes the
parameters from right to left, and cleans the stack upon return from the call.
A call to a Stdcall routine would look as follows:

        ;corresponds to the C code
        ;procedure(param1, param2)
        push dword param2
        push dword param1
        call procedure

A Stdcall-style routine would have the following structure:

        push ebp
        mov ebp, esp
        ...
        mov esp, ebp
        pop ebp
        ret 8

There is also a Register calling convention [also called "fastcall"] which
uses registers rather than the stack to pass parameters. The registers used
vary from compiler to compiler; in the scheme shown here the first parameter
is passed in EAX, the second in EDX, and the third in EBX; subsequent
parameters are passed via the stack. A call to a Register routine would look
as follows:

        ;corresponds to the C code
        ;procedure(param1, param2, param3)
        mov eax, param1
        mov edx, param2
        mov ebx, param3
        call procedure

Note that there is no defined standard method of clearing the stack for the
Register convention; however most implementations clear the stack in the
Pascal style.



::/ \::::::.
:/___\:::::::.
/|    \::::::::.
:|   _/\:::::::::.
:|   _|\  \::::::::::.
:::\_____\:::::::::::...........................................FEATURE.ARTICLE
                                                             Define Your Memory
                                                                 by Alan Baylis


[I am going to preface this article with a brief note, since it is not
covering assembly language per se, but rather a utility that will be of use to
asm coders. The author sums it up well in his original email to me:

"Define is a new type of assembler/disassembler that does not use source code.
The program reads the byte values in memory and checks a library to find a
definition that describes the byte values it reads. The library can be added
to and is used as a permanent macro list to write instructions, functions, etc
to memory.
Most assemblers also use standard 3 character mnemonics to describe the
instruction set, however, with Define you can rename the instructions and your
own macros to anything and up to 250 characters."

Sounds pretty promising. _m ]


For the x86 series of processor I have been working on a new type of assembler
and have written a program called Define. The program could be called a sketch
of what a future version might be like. The program is fully workable but
suffers from a few limitations. The first is that it is written in QBASIC,
which may be a blow to devoted machine coders. The second is that it can only
comfortably use about three hundred definitions (definitions are like a
library of machine code macros and I'll discuss them more fully later). A
third limitation, not to its functionality, is that the program doesn't have a
quick mouse and menu driven interface, but I'm working on it.

I liked the idea of macros and saw the necessity for using them so that I and
others don't have to "reinvent the wheel" as it has been put, but I wanted a
way to see the machine code instructions and the byte values that made up the
macro. This can't be done through using source code, as the finished code is
generated at the discretion of the compiler's authors and requires a debugger
to verify its content.

To make what was originally intended to be a debugger but without the source
code, I decided to make a program that could read memory and interpret the
byte values it finds into their mnemonic equivalents or better (much like a
debugger), so that while reading memory, if the program found the byte value
205 followed by the value 5 it would display "INT 5". To do this I needed what
I termed a 'definition', which included the byte values that make up an
instruction or small macro and included a description or name for the function
they perform.
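
To make the idea of a definition concrete, here is a rough sketch in C of a
byte-pattern lookup of the kind described above. It is not how Define itself
is implemented (Define is written in QBASIC); the Definition structure, the
three sample entries and the find_definition function are all assumptions
made up for this illustration, and the table is compiled in purely for
brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One hypothetical definition: a byte sequence and the name shown for it. */
typedef struct {
    const unsigned char *bytes;
    size_t               len;
    const char          *name;
} Definition;

static const unsigned char def_int5[] = { 205, 5 };   /* CD 05 */
static const unsigned char def_nop[]  = { 144 };      /* 90    */
static const unsigned char def_ret[]  = { 195 };      /* C3    */

/* A tiny illustrative library of definitions. */
static const Definition library[] = {
    { def_int5, sizeof def_int5, "INT 5" },
    { def_nop,  sizeof def_nop,  "NOP"   },
    { def_ret,  sizeof def_ret,  "RET"   },
};

/* Return the first definition whose bytes match the memory at mem
   (avail bytes are readable there), or NULL when nothing matches. */
const Definition *find_definition(const unsigned char *mem, size_t avail)
{
    size_t i;

    assert(mem != NULL);
    for (i = 0; i < sizeof library / sizeof library[0]; i++) {
        const Definition *d = &library[i];
        if (d->len <= avail && memcmp(mem, d->bytes, d->len) == 0)
            return d;
    }
    return NULL;
}
```

Reading memory then amounts to calling find_definition at the current
address, printing the name that comes back, and stepping forward by the
length of the matched definition.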
Unlike what I had done with a previous assembler, I decided to put the
definitions in a separate file rather than include them as data within the
main program; this allowed for the addition or removal of future definitions.
I then quickly realised that since these definitions contained the byte values
of an instruction, they could also be used to write the bytes into memory. I
added functions to save and load programs as well as functions to manipulate
the definition file, and the program was underway.

I found while writing the definitions for the instruction set that it would be
good (and necessary) if the program could read an instruction even if one of
the bytes is unknown or variable; I decided to call these bytes undefined
bytes, so that if the program found the number 205 it would display something
like "Interrupt call" regardless of what number followed.

While reading memory I also wanted a way to exclude data areas from being
interpreted into definitions, so I added a new definition type called
addresses, which contain the address of the first and last bytes of a data
area and a name to describe the data area. If these are turned on in the
program then they are used instead of the normal definitions when reading that
part of memory. To then take Define closer to being an assembler rather than a
debugger, I also included labels that label memory addresses and the
destination of jump and branch instructions.

I envision that a future version of Define, written in machine code, or a
similar program will have a pop up list of definitions and use a point and
click method of writing the code, as opposed to the current method of
scrolling through them from a different page.
The future version will also need to be able to handle thousands of
definitions, as opposed to the few hundred it can use at a time now, in order
to accommodate situations such as the following. To call interrupt 21h
function 9, which prints a string, it is necessary to put the function number
9 in AH and the address of the string in the registers DS:DX and then call the
interrupt:

        MOV AH,9
        MOV DX,address
        INT 21h

However, it is also valid to put the number 9 in AH after the address of the
string has been put in DS:DX:

        MOV DX,address
        MOV AH,9
        INT 21h

To make a definition for this interrupt at least two definitions will need to
be made, and therefore a larger definition file. This also doesn't account for
the situation in which the number 9 may have been loaded three instructions
earlier and is assumed to be correct at the time when the interrupt is called;
in this case only the definitions for the instructions will be seen and not a
definition for the interrupt.

One of the best aspects of Define in my book is that the memory can be viewed
according to a person's level of understanding (or will be as the definitions
are written); for example, the program is able to show only definitions of a
certain level and no other. I have chosen to represent the level of a
definition by its color. I have used blue (1) for the lowest level, which are
the instruction set definitions, then green (2) for the next level, which are
the DOS, BIOS, etc definitions, and then magenta (3) for the next level, which
may be definitions to clear the screen and print the date combined, and so on,
so that a person who knows little about machine code may set the maximum
definition color to red (4) and still be able to write a program using Define.
The advantage for those who know machine code is that they need not be restricted to only a high level definition: by turning the observance of the color off they can press the letter B when viewing a high level definition and see the lower level definitions that make up the higher one. By repeatedly pressing B they can view the program as level 1 (blue) or even as the byte values themselves.

The most radical departure from most assemblers is that a program is composed in memory: the byte values of the definitions are written directly to an unused or reserved area of memory, where they can be further altered directly while reading memory. This could also be said to be the most dangerous method, as it can easily lead to accidentally overwriting other areas of memory. While this is true, I have also found a benefit: if Define is stopped and then restarted, the program being written will still be in memory without having been saved (depending on where in memory the program is being written).

The maker of a violin, while demonstrating it, must have said at one time or another "A good violinist could really show you how to play it". I too, like the maker of a violin, am sure there are better definition writers than myself. To become a high level language the high level definitions need to be written, and I ask any person who has a passion for hand written code to send me a definition or two to include in the definition file.

You can download Define from my homepage at http://members.net-tech.com.au/alaneb/default.htm and there is a step by step guide to using the program, called manual.doc, in the zip file. Please send any definitions or response to Alan at alien1_3@excite.com

::/ \::::::.
:/___\:::::::.
/| \::::::::.
:| _/\:::::::::.
:| _|\ \::::::::::.
:::\_____\:::::::::::...........................................FEATURE.ARTICLE
Writing a Boot Sector in A86
by Jan Verhoeven

I have been coding for FreeDOS for some time, but that is a C project and I rather hate C. It is so clumsy. That's also why I always code in A86 assembly language: the "No Red Tape" assembler that makes life a lot easier for programmers.

A86 is good. The debugger (D86) could be better, but not by much. I registered my version and I want to encourage everyone to follow my lead. The software is good enough to pay for, and paying ensures proper development of the software. If you can spare 20 bucks a month for the ISP, you should also spend this on quality software.

During the last two years I have been submitting bugs to Isaacson and all of them have been fixed in the latest version (4.03).

Besides being the best assembler around, A86 has some idiosyncrasies which some people need to get used to. Plus my personal preferences, which might add to that...

 - When I refer to a memory location I use square brackets.
 - I use single quotes for texts
 - I use most of the A86 features.

Some of the A86 features are:

 - very powerful macro language
 - numbers starting with a ZERO are ALWAYS hex, no matter how they end
 - easy IF statements to reduce nonsense labelnames
 - local labels, like below: only two local labels.

I started out on the Z-8000, back in 1981, switched to the Z-80, Z-8, 8086, PIC 16Cxx, some 8051 (Barffff), some 68K (yummie yummie). Mainly in ASM and else in Modula-2.

I have some really cool and useful routines lying around for DOS. And I'm gonna share them with the world.

The following code is a bootsector which can be used for non-bootable disks, in this case for a 1.44 MB floppy disk. You could use it to make a commercial out of every non-bootable disk.
First the code:

----- Code file -------------------------------------------------

        name    flopnb
        title   Floppy disk boot sector, non-bootable, 1.44 Mb
        page    80, 120

; version 1.0 : It works : OK       12-12-1998

lf      =       10
cr      =       13

        org     0

        jmp     short main      ; this is critical!
        nop                     ; and this too!
; ----------------------
OEMname  db     'StupiDOS'
BpS      dw     512             ; bytes per sector
SpA      db     1               ; sectors per allocation unit (=cluster)
ResSect  dw     1               ; reserved sectors, starting from sector 0
NrFats   db     2               ; number of FAT's on this disk
FiR      dw     224             ; number of entries in ROOT directory
Total    dw     2880            ; number of sectors per disk
ToM      db     0F0             ; Type of Media
SpF      dw     9               ; Sectors per Fat
SpT      dw     18              ; sectors per Track
Heads    dw     2               ; number of heads
Hidden   dw     0, 0            ; Hidden sectors
GrandTot dd     0               ; total for disks over 32 Mb
IntId    db     0, 0
BootSign db     029             ; extended boot signature
VolumeID dd     0566E614A       ; serial number ...
DiskLabl db     'DOS is MINE'   ; volume label
FATtype  db     'FAT-12  '      ; FAT type (8-byte field)

         db     'VeRsIoN=1.0', 0        ; for version control only
; ----------------------
L1:     push    si              ; stack up return address
        ret                     ; and jump to it

print:  pop     si              ; this is the first character
        mov     bx, 0           ; video page 0
L0:     lodsb                   ; get token
        cmp     al, 0           ; end of string?
        je      L1              ; if so, exit
        mov     ah, 0E          ; else print it
        int     010             ; via TTY mode
        jmp     L0              ; until done
; ----------------------
main:   cld                     ; init direction flag
        cli                     ; take care of 1 faulty batch of 88's in 1980
        mov     ax, 07C0        ; this is the segmentvalue at start
        mov     ds, ax          ; store it in DS, ES
        mov     es, ax
        mov     ax, 0           ; clear ax ...
        mov     ss, ax          ; ... to prime the SS register
        mov     sp, 07C00       ; set stackpointer
        sti                     ; OK, interrupts may come again

        call    print           ; show that message
        db      cr
        db      'This is not a bootable floppy. '
        db      'Please strike any key to reboot.', cr, lf
        db      'This floppy disk is formatted by FreeDOS', cr, lf, lf
        db      'Please visit us at www.freedos.org', cr, lf, 0

L0:     mov     ah, 1           ; wait for keypress by ...
        int     016             ; ... interrogating keyboard
        jz      L0              ; if no key pressed, loop back

        mov     ax, 0           ; else address system variables
        mov     es, ax          ; in order to ...
        es mov  w [0472], 01234 ; signal: NO POST and go on ...
        jmp     0FFFF:0000      ; with the next reboot

        org     01FE            ; look for the dotted line and ...
        db      055, 0AA        ; ... don't forget to sign!

------------------------------------------------- Code file -----

The first three lines are straightforward: name, title and page. Not much to tell about that. Then some version info for the programmer, some equates and the ORG statement.

If no ORG is supplied, A86 will assume it is ORG 0100. I ordered an ORG 0, which means several things:

 - start assembly at address 0
 - the output file will be called *.BIN

Bootsectors must start with some particular bytes. The first three bytes need to be either a short jump (opcode plus a one-byte offset) followed by a NOP, or a (long) jump without a NOP.

At offset 03 of the bootsector starts the OEM name, followed at offset 0B by the BPB (BIOS Parameter Block), which tells the OS what kind of disk this is. Please put ASCII in the OEM name, or virus scanners might trip on it with a "Bloodhound warning".

After the description of the geometry of this disk, I included an extended boot signature, since we have ample room left. It contains Volume ID, Disk Label, and FAT-type strings.

The PRINT subroutine is a nice one. It will print the ASCIIZ string that follows the call. This is quite a handy routine since you can simply change messages without having to worry about the address and length of the actual message. Print is called like this:

        call    print
        db      'Hello World', cr, lf, 0
        ...

Print takes the "return address" off the stack. This of course is no return address but the address of the message. What follows is easy:

 - get next character
 - IF (non-zero) print character ELSE leave loop ENDIF
 - the current si pointer is the actual return address... So we push it
 - and return to caller.

Perhaps a jmp si could be possible too, but I like clear code, in most cases.
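For readers curious about the jmp si variant, the sketch below moves the same inline-string idea into a DOS .COM program so it can be tried without writing a boot sector. It is an illustration rather than the author's code: the labels pr_next and pr_done and the message are invented here, and the only change to the routine is that it ends with jmp si instead of push si / ret.

```
; Illustrative sketch only. A86 syntax, DOS .COM file.
        org     0100

        cld                             ; LODSB must step forward
        call    print                   ; the string follows the call
        db      'Hello World', 13, 10, 0
        mov     ax, 04C00               ; execution resumes here
        int     021                     ; DOS: terminate, exit code 0

print:  pop     si                      ; "return address" = start of string
        mov     bx, 0                   ; video page 0
pr_next:
        lodsb                           ; AL = next character, SI advances
        cmp     al, 0                   ; end of string?
        je      pr_done
        mov     ah, 0E                  ; BIOS teletype output
        int     010
        jmp     pr_next
pr_done:
        jmp     si                      ; SI points just past the 0 byte
```

Either ending works because LODSB has already stepped SI past the terminating zero, so SI holds the address of the first instruction after the string.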
If you need obfuscated code, switch to C. :)

The actual program is very simple. It just sets up a stack and the segment registers, and then prints that it will do nothing. Gee, what a life...

After the message we wait for a key and next signal:

 - fast reboot
 - jump to the reboot vector

Whatever there will be between end of code and offset 01FE is not relevant (it could be your ad) but the last two bytes of the boot sector must be a valid boot signature.

That's it. With this code you can make your own custom non-bootsector. I hope this software has also shown that linking and assuming are supported by A86, but certainly not necessary. Also, this software does not rely on any HLL calls. It's just assembly language as it should be.

I want to remark that this software is Open Source, according to the rules of the GNU GPL. Make sure you understand these rules before embedding this routine in your own software.

::/ \::::::.
:/___\:::::::.
/| \::::::::.
:| _/\:::::::::.
:| _|\ \::::::::::.
:::\_____\:::::::::::...........................................FEATURE.ARTICLE
A Basic Virus Writing Primer
by Chili

What horror must the ignorant victim undergo as it becomes aware of a being that lives inside its own body, growing ever stronger, reproducing itself until its host, unable to bear more, finally collapses and dies a horrible death. What panic it must feel, knowing nothing can be done in time to avoid such a terrible fate. A predator so tiny that it spreads unsuspected from one host to another, thereby rapidly infecting millions. An organism so utterly resourceful and small that it stays undetectable most of the time, breeding in the shadows.

Computer viruses aren't much different from their biological counterparts, but instead of infecting cells they infect files and boot sectors. In this article I'll try to explain the basics of file viruses, more specifically runtime (aka direct action) COM infectors.
This will cover most of the simple search and replication methods used and is only to be considered as an introduction to virus writing.

After some thought I've decided not to include any full source code for a working virus, since anyone with half a brain and a somewhat mediocre knowledge of assembly can easily build a virus out of the pieces of code that will be presented. Furthermore it's not my wish to increase the number of viruses in the wild, something that would undoubtedly happen at the hands of some I-have-no-brain-and-can't-program-hellspawn bent on random destruction. Anyway, on with the article...

Some Sort Of 'Programming Virii Safely' Guide
---------------------------------------------

The only really safe way to program viruses is to know what you're doing and understand at all times how the virus is behaving. If you test a virus on your own machine without fully comprehending its ins and outs, then you will most likely have your system trashed. It would be best if you had a second computer just for this purpose, since buggy programming can lead to a lot of crashes and general havoc. If not, a Ramdrive can be created and a Subst can be done, so that all accesses to physical drives are redirected to the virtual one.

Assuming that you want your Ramdrive to have 512-byte sectors, a limit of 1024 entries and to allocate 2048K of extended memory, you must add this line to your CONFIG.SYS:

        DEVICE=C:\DOS\RAMDRIVE.SYS 2048 512 1024 /E

Then you must copy COMMAND.COM and SUBST.EXE to the Ramdrive so that DOS won't hang and so that you are able to delete all redirections when done. To associate all physical drives with the newly created virtual drive (assuming that it is D: and your drives are A: and C:) you should do:

        SUBST A: D:\
        SUBST C: D:\

Of course this last method isn't perfect. You should always know how to completely remove a virus before running it, or you'll end up cleaning up the mess for quite some time. Just use common sense.
For example, if you're writing a virus aimed at a specific file type, all you have to do is copy all files of that type you do not wish to be infected to a different extension, and when you're done testing just switch those files back to their original extension.

While testing you should also place breakpoints and warning messages throughout the code, so that you know at all times what the virus is doing; this will also help you debug it. You should also program and test different routines separately, as that reduces complexity and bug proneness.

Lastly, the use of memory and disk mapping/editing utilities, a good set of anti-virus tools and, most importantly, backups is encouraged, so that you can keep track of things and are able to restore your system in case something goes wrong.

In case things get really out of hand you should always have a clean "rescue disk", which you should create by doing a FORMAT A: /S /U and then copying onto it some useful DOS files like FORMAT.COM, UNFORMAT.COM, FDISK.EXE, SYS.COM, MEM.EXE, ATTRIB.EXE, DEBUG.EXE, CHKDSK.EXE, SUBST.EXE, a text editor just in case and whichever other files you may find useful. Also an anti-v